Enhancing children's speech recognition under mismatched condition by explicit acoustic normalization
Abstract
Most commonly used model adaptation techniques apply a linear or affine transformation to the models or features to address the gross acoustic mismatch between adults’ and children’s speech data. Since a linear transformation alone may not appropriately model all sources of acoustic mismatch, this work explores the efficacy of our recently proposed explicit acoustic (pitch and speaking rate) normalization in combination with existing normalization/adaptation techniques for mismatched children’s speech recognition. The study shows that explicitly normalizing the pitch and speaking rate of children’s speech further improves the effectiveness of the adaptation methods. With explicit acoustic normalization, significant relative improvements of 13% and 5% are obtained over combined VTLN and CMLLR when children’s speech is recognized with models trained on adults’ speech, for connected digit and continuous speech recognition tasks, respectively.
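The two quantities the abstract relies on, an affine feature transformation and a relative improvement, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the affine form x' = Ax + b is the generic one used by feature-space adaptation such as CMLLR, and the relative improvement is assumed here to be a relative reduction in an error rate, which the abstract does not state explicitly.

```python
import numpy as np


def affine_feature_transform(features, A, b):
    """Apply x' = A x + b to every frame (row) of a feature matrix.

    Generic affine feature-space transform (hypothetical helper);
    estimating A and b from adaptation data is not shown here.
    """
    features = np.asarray(features, dtype=float)
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    return features @ A.T + b


def relative_improvement(baseline_error, new_error):
    """Relative error reduction in percent (assumed metric: an error rate)."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error
```

Under this assumption, a hypothetical baseline error of 10.0% falling to 8.7% would correspond to a 13% relative improvement; these numbers are purely illustrative and are not taken from the paper.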
Related resources
Exploring the Effect of Differences in the Acoustic Correlates of Adults' and Children's Speech in the Context of Automatic Speech Recognition
This work explores the effect of mismatches between adults’ and children’s speech due to differences in various acoustic correlates on the automatic speech recognition performance under mismatched conditions. The different correlates studied in this work include the pitch, the speaking rate, the glottal parameters (open quotient, return quotient, and speech quotient), and the formant frequencie...
Pitch-Adaptive Front-End Features for Robust Children's ASR
In the presented work, we explore some of the challenges in recognizing children’s speech on automatic speech recognition (ASR) systems developed using adults’ speech. In such mismatched ASR tasks, a severely degraded recognition performance is observed due to the gross mismatch in the acoustic attributes between those two groups of speakers. Among the various sources of mismatch, we focus on t...
A Study on the Effect of Pitch on LPCC and PLPC Features for Children's ASR in Comparison to MFCC
In this work, following our previous studies, we study and quantify the effect of pitch on LPCC and PLPC features and explore their efficacy for children’s mismatched ASR in comparison to MFCC. Our analysis shows that, unlike MFCC, the LPCC feature is not strongly influenced by pitch variations. On the other hand, similar to MFCC, though PLPC is also found to be significantly affected by pitch variatio...
Investigating recognition of children's speech
In this work, recognition of children’s speech was investigated by considering a phone recognition task. Two baseline systems were trained, one for children and one for adults, by exploiting two Italian speech databases. Under matching conditions (training and recognition performed with data from the same population group), the phone recognition accuracy was 77.30% and 79.43% for children and adu...
On the development of matched and mismatched Italian children's speech recognition systems
While at least read speech corpora are available for Italian children’s speech research, there exist many languages which completely lack children’s speech corpora. We propose that learning statistical mappings between the adult and child acoustic space using existing adult/children corpora may provide a future direction for generating children’s models for such data deficient languages. In thi...
Publication date: 2010